Section: New Results
Activity Detection in Long-Term Untrimmed Videos by Discovering Sub-Activities
Participants: Farhood Negin, Abhishek Goel, Abdelrahman G. Abubakr, Gianpiero Francesca, Francois Brémond.
Keywords: Activity detection, Semi-supervised learning, Sub-activity detection.
Detecting the temporal boundaries of activities is important for analyzing large-scale videos. However, several challenges must still be overcome before activities can be temporally segmented accurately. Detecting activities of daily living is even more challenging because of their high intra-class and low inter-class variations and the complex temporal relationships among sub-activities performed in realistic settings. To tackle these problems, we propose an online activity detection framework based on the discovery of sub-activities. We consider a long-term activity as a sequence of short-term sub-activities. Our contributions can be summarized as follows:
- We introduce a new online, frame-level activity detection pipeline that uses a single fixed-size window. A weakly supervised classifier is trained directly on sub-activities discovered by clustering and is applied to test videos to capture the sub-activities of long videos within a fixed temporal window.
- To reduce noisy detections, especially at activity boundaries, we propose a novel greedy post-processing method based on Markov models.
- We extensively evaluated the proposed method on untrimmed videos from the DAHLIA [68] and GAADRD [77] datasets and achieved state-of-the-art performance.
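The report does not spell out the greedy post-processing algorithm. As a hypothetical illustration only, the sketch below shows one greedy way to use a first-order Markov model on per-frame classifier outputs: labels are decoded left to right, and at each frame the label chosen maximizes the log of the classifier's probability for that label plus the log-probability of moving to it from the previously chosen label. Unlike Viterbi decoding, earlier choices are never revisited. The functions `greedy_markov_smooth` and `uniform_sticky_transitions`, the parameter `stay_prob`, and the "sticky" transition model are assumptions made for this sketch, not the authors' method.

```python
import math
from typing import List, Sequence


def uniform_sticky_transitions(n_classes: int, stay_prob: float) -> List[List[float]]:
    """Hypothetical transition matrix: keep the current activity with
    probability `stay_prob`, otherwise switch uniformly to another class."""
    if n_classes < 2:
        raise ValueError("need at least two classes")
    off = (1.0 - stay_prob) / (n_classes - 1)
    return [[stay_prob if i == j else off for j in range(n_classes)]
            for i in range(n_classes)]


def greedy_markov_smooth(frame_probs: Sequence[Sequence[float]],
                         transitions: Sequence[Sequence[float]]) -> List[int]:
    """Greedy first-order Markov decoding (an assumption, not the paper's
    exact algorithm). Frame t gets the label c maximizing
        log p_t(c) + log T[previous label][c],
    committing to each decision instead of back-tracking like Viterbi."""
    eps = 1e-12  # guards against log(0)
    labels: List[int] = []
    for t, probs in enumerate(frame_probs):
        best, best_score = 0, -math.inf
        for c, p in enumerate(probs):
            score = math.log(p + eps)
            if t > 0:  # the first frame has no predecessor
                score += math.log(transitions[labels[-1]][c] + eps)
            if score > best_score:
                best, best_score = c, score
        labels.append(best)
    return labels
```

With `stay_prob=0.8`, an isolated frame whose scores are `[0.4, 0.6]` inside a run of class-0 frames keeps label 0, while a sustained run of confident class-1 frames (`[0.05, 0.95]`) still switches the label. A larger `stay_prob` suppresses more noise but makes true boundaries harder to cross.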
Proposed Method:
Our framework produces frame-level activity labels online in two major steps, followed by a novel greedy post-processing technique. To handle long activities, each activity is decomposed into a sequence of fixed-length, overlapping temporal clips, from which we extract deep features. We propose a person-centric feature (PC-CNN), based on the SSD detector, that meets the processing-efficiency requirements of online systems. We then propose a weakly supervised method for discovering the sub-activities of long-term activities, which uses clustering and model selection to find the optimal sub-activities of each activity. To characterize each activity by its constituent sub-activities, we cluster that activity's clips with K-means and build a sub-activity dictionary specific to it, so there is one sub-activity dictionary per main activity. An activity sequence is then represented by its sub-activity assignments under the trained dictionary, and for each activity class we train a binary one-versus-all SVM classifier on its sub-activities (Figure 20). The trained classifiers are then applied together, within a sliding-window architecture, to produce frame-level activity labels. Note that, unlike multi-scale sliding-window methods, we use only a single fixed-size temporal window, because the sub-activities being recognized have a fixed length. Finally, assuming a temporal progression of sub-activities, we developed a greedy algorithm based on Markov models to refine noisy sub-activity proposals in the middle and boundary regions of long activities. We evaluated the proposed method on two daily-living activity datasets and achieved state-of-the-art performance.
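The pipeline described above can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not the authors' implementation: clip features are assumed to be precomputed vectors (standing in for PC-CNN or HOG descriptors); each activity's dictionary is built with plain K-means using a fixed `k` (the report instead chooses the sub-activities with model selection); a window is encoded as a normalized histogram of nearest-atom assignments over the pooled dictionaries (the text does not specify the encoding); and the one-versus-all classifiers are linear soft-margin SVMs trained by subgradient descent rather than with a dedicated solver. All function names (`overlapping_clips`, `kmeans`, `build_dictionaries`, `encode_window`, `train_linear_svm`, `train_one_vs_all`, `label_clips`) and parameters (`k`, `window`, `lam`, `lr`, `epochs`) are hypothetical.

```python
import numpy as np


def overlapping_clips(n_frames, clip_len, stride):
    """(start, end) frame ranges of fixed-length clips; stride < clip_len gives overlap."""
    return [(s, s + clip_len) for s in range(0, n_frames - clip_len + 1, stride)]


def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's K-means; the k centres act as sub-activity atoms."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        dists = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=-1)
        assign = dists.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):  # keep the old centre if a cluster empties
                centres[j] = X[assign == j].mean(axis=0)
    return centres


def build_dictionaries(clips_by_activity, k):
    """One sub-activity dictionary per activity: K-means over that activity's clips."""
    return {a: kmeans(np.asarray(f, dtype=float), k) for a, f in clips_by_activity.items()}


def encode_window(window_feats, dictionaries):
    """Normalized histogram of nearest-atom assignments. Pooling all activity
    dictionaries into one codebook is an assumption of this sketch."""
    atoms = np.vstack([dictionaries[a] for a in sorted(dictionaries)])
    feats = np.asarray(window_feats, dtype=float)
    dists = ((feats[:, None, :] - atoms[None, :, :]) ** 2).sum(axis=-1)
    hist = np.bincount(dists.argmin(axis=1), minlength=len(atoms)).astype(float)
    return hist / hist.sum()


def train_linear_svm(X, y, lam=0.01, lr=0.5, epochs=300):
    """Binary linear SVM (labels in {-1, +1}) trained by full-batch subgradient
    descent on lam/2 * ||w||^2 + mean hinge loss."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        active = y * (X @ w + b) < 1  # samples inside or beyond the margin
        grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / len(y)
        grad_b = -y[active].sum() / len(y)
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b


def train_one_vs_all(H, labels, classes):
    """One binary SVM per activity: that activity versus all the others."""
    return {c: train_linear_svm(H, np.where(labels == c, 1.0, -1.0)) for c in classes}


def label_clips(clip_feats, dictionaries, svms, window):
    """Slide one fixed-size window (measured in clips) with stride 1; every clip
    sums the SVM scores of all windows covering it and takes the best class."""
    n = len(clip_feats)
    if n < window:
        raise ValueError("video shorter than the window")
    classes = sorted(svms)
    votes = np.zeros((n, len(classes)))
    for s in range(n - window + 1):
        h = encode_window(clip_feats[s:s + window], dictionaries)
        votes[s:s + window] += [h @ svms[c][0] + svms[c][1] for c in classes]
    return [classes[i] for i in votes.argmax(axis=1)]
```

Clip labels can be mapped back to frames through the `(start, end)` ranges returned by `overlapping_clips`. In practice, a library solver such as scikit-learn's `LinearSVC` and `KMeans` would replace the hand-written training loops; they are written out here only to keep the sketch dependent on NumPy alone.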
Table 1: Results on DAHLIA per camera view.
View | ELS FA_1 | ELS F-score | ELS IoU | Max Subgraph Search FA_1 | Max Subgraph Search F-score | Max Subgraph Search IoU | DOHT (HOG) FA_1 | DOHT (HOG) F-score | DOHT (HOG) IoU | Sub Activity FA_1 | Sub Activity F-score | Sub Activity IoU
View 1 | 0.18 | 0.18 | 0.11 | - | 0.25 | 0.15 | 0.80 | 0.77 | 0.64 | 0.85 | 0.81 | 0.73
View 2 | 0.27 | 0.26 | 0.16 | - | 0.18 | 0.10 | 0.81 | 0.79 | 0.66 | 0.87 | 0.82 | 0.75
View 3 | 0.52 | 0.55 | 0.39 | - | 0.44 | 0.31 | 0.80 | 0.77 | 0.65 | 0.82 | 0.76 | 0.69
Table 2: Results on GAADRD.
Method | FA_1 | F-score | IoU
Simple sliding window (HOG) | 0.68 | 0.52 | 0.40
Simple sliding window (PC-CNN) | 0.61 | 0.55 | 0.44
Tables 1 and 2 show the results of applying the developed framework to DAHLIA and GAADRD, respectively. On DAHLIA, compared with [71], [61] and [60], our method significantly outperforms the state of the art in all categories except camera view 3 with the F-score metric. For GAADRD, we report results with two types of features, HOG and PC-CNN. As can be seen, even with hand-crafted (HOG) features the framework produces results comparable to those obtained with PC-CNN. In future work, we plan to improve the sub-activity discovery algorithm so that it can distinguish similar sub-activities that occur in two different activities.
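The report does not define the metrics behind the tables. The sketch below computes common frame-level definitions, assumed here for illustration: plain frame accuracy, plus per-class F-score (2TP / (2TP + FP + FN)) and IoU (TP / (TP + FP + FN)) averaged over all classes present in the ground truth or the prediction. The function name `frame_metrics` is hypothetical, frame accuracy is not claimed to be identical to FA_1, and the official DAHLIA and GAADRD evaluation protocols may differ in detail (for example, in how classes are weighted).

```python
from typing import Dict, Sequence


def frame_metrics(gt: Sequence[int], pred: Sequence[int]) -> Dict[str, float]:
    """Frame accuracy and class-averaged frame-level F-score and IoU
    (common definitions assumed here; official scripts may differ)."""
    if len(gt) != len(pred) or not gt:
        raise ValueError("gt and pred must be non-empty and equally long")
    classes = sorted(set(gt) | set(pred))
    f_scores, ious = [], []
    for c in classes:
        tp = sum(g == c and p == c for g, p in zip(gt, pred))
        fp = sum(g != c and p == c for g, p in zip(gt, pred))
        fn = sum(g == c and p != c for g, p in zip(gt, pred))
        # Every class here occurs in gt or pred, so tp + fp + fn > 0.
        f_scores.append(2 * tp / (2 * tp + fp + fn))
        ious.append(tp / (tp + fp + fn))
    return {
        "frame_accuracy": sum(g == p for g, p in zip(gt, pred)) / len(gt),
        "f_score": sum(f_scores) / len(f_scores),
        "iou": sum(ious) / len(ious),
    }
```

For a ground truth of four class-0 frames followed by four class-1 frames, a prediction that moves the boundary one frame early gives a frame accuracy of 7/8 = 0.875 and a mean IoU of (3/4 + 4/5) / 2 = 0.775, which shows how IoU penalizes boundary errors more strongly than accuracy does.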